Auto-correlation Dependent Bounds for Relational Data
Abstract
A large portion of the data collected in application domains such as online social networking, finance, and biomedicine is relational in nature. Statistical Relational Learning (SRL), a subfield of machine learning, is concerned with performing statistical inference on relational data. A defining property that separates relational data from independent and identically distributed (i.i.d.) data is the existence of correlations between individual datapoints. A major portion of the theory developed in machine learning assumes that the data is i.i.d. In this paper we develop theory for the relational setting. In particular, we derive distribution-free bounds for the relational setting, where the class of data generation models we consider is inspired by the type of joint distributions represented by the relational classification models developed by the SRL community. A key aspect of the bound we derive is that its tightness is a function of the strength of dependence between related datapoints, with the bound reducing to the standard Hoeffding's or McDiarmid's inequality when there is no dependence. To the best of our knowledge, this is the first bound for relational data whose tightness varies with the strength of dependence.
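For reference, the two classical inequalities that the abstract names as the no-dependence limit can be stated in their standard textbook form. This is only the independent-case baseline, not the paper's dependence-aware bound, which is not given on this page. The symbols $X_i$, $a_i$, $b_i$, $c_i$, $f$, $n$, and $t$ are notation chosen here for illustration.

Hoeffding's inequality: if $X_1, \dots, X_n$ are independent with $X_i \in [a_i, b_i]$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then for every $t > 0$,

\[
\Pr\left( \left| \bar{X} - \mathbb{E}[\bar{X}] \right| \ge t \right) \le 2 \exp\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
\]

McDiarmid's inequality: if $X_1, \dots, X_n$ are independent and changing the $i$-th argument of $f$ alone changes its value by at most $c_i$, then for every $t > 0$,

\[
\Pr\left( f(X_1, \dots, X_n) - \mathbb{E}[f(X_1, \dots, X_n)] \ge t \right) \le \exp\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]

Both statements require independence between datapoints, which is the assumption that relational data violates.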
Publication date: 2013